Iteration of DESEQ2 over: Region level, then treatment comparison
Region level - e.g. Tissue region, indicates all DE data from same tissue and experiment. These are analysed independently from every other region.
Within a region, the treatments are compared.
However, for downstream plotting, the normalised data from all regions is calculated. Particularly important for co-expression related analysis.
To normalise gene expression across all tissues, the ’DESEQ2 - all tissues - FULL - **’ needs to be used.
#Step 1a in pipeline
#import data about columns, that is each column should be metadata about the sample/animal. Particularly include all treatment info and animal ID in columns.
coldata <- data.table::fread(file = "../Sheep_RNAseq/Inputs/coldata_new_names_20201105.csv") %>%
dplyr::mutate(sample_names = paste0(Region, "_", ID, "__", Diet))
#import data about columns, that is each column should be metadata about the sample/animal. Particularly include all treatment info and animal ID in columns.
coldata <- data.table::fread(file = "../Sheep_RNAseq/Inputs/coldata_new_names_20201105.csv") %>%
dplyr::mutate(sample_names = paste0(Region, "_", ID, "__", Diet))
TODO - make it handle when too many names are in coldata compared to cts_all Error: Can’t subset columns that don’t exist. x Columns BONE_8__Feed, BONE_20__Fast, and BONE_15__Refeed don’t exist.
#re-run check
cts_filtered <- check_count_matrix(count_data = cts_all,
colData = coldata,
column_with_col_names = sample_names
)
PASS: All columns in count_data have matching data in colData
PASS: Rownames not detected
PASS: First column name is 'gene_ensembl'
Dimensions of raw counts dataset BEFORE sorting:
genes: 27054 samples: 239
Number of samples in colData: 239
Dimensions of raw counts dataset AFTER sorting:
genes: 27054 samples: 239
test_annot <- annotate_gene_ensembl(cts_filtered, organism = "oaries")
Using organism: oaries
Important - this produces duplicate gene names from annotation - however gene_ensembl remain unique.
This is why it's important to use gene_ensembl for all tasks and annotate output at end.
[1] "duplicate genes in annotation: 1801"
TO DO: make a wrapper for this with specific checks of data / auto matrix
#subset columns to be every third, to make testing quicker
test_data <- test_data[,seq(0,ncol(cts_filtered)-1,4)]
Error: subscript contains out-of-bounds indices
#4. Generate DE results TO DO: - detect all options for ‘top_level_filter’ and parse to map().
DE_out <-
purrr::map(list("LIV"),
auto_generate_DE_results,
se_data = test_data,
top_level_name = Region,
column_of_samples = sample_names,
samples_to_remove = NA,
DESeq2_formula_design = ~Region_Diet,
rowSums_filter = 10, #for dds filtering
results_contrast_factor = Region_Diet,
results_combinations = NA,
use_IHW_filtering = TRUE,
alpha = 0.05,
gene_annotations = test_annot,
export_tables = TRUE,
export_dir = "./outputs/normalised_counts3/")
******************* Start of - LIV *******************
renaming the first element in assays to 'counts'
Warning in DESeq2::DESeqDataSet(se_data0, design = DESeq2_formula_design) :
some variables in design formula are characters, converting to factors
Beginning DESeq analysis...
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Completed DESeq analysis.
Plotting cooks distance...
Cooks distance plot complete.
Generating pairwise DESeq2 results...
[1] "Contrast levels: LIV_HCP.HP.UMEI, LIV_LCP.HP.UMEI, LIV_LCP.LP.UMEI, LIV_HCP.HP.RMEI, LIV_HCP.LP.UMEI"
log2 fold change (MLE): Region_Diet LIV_HCP.HP.UMEI vs LIV_LCP.HP.UMEI
out of 16278 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 82, 0.5%
LFC < 0 (down) : 36, 0.22%
outliers [1] : 8, 0.049%
[1] see 'cooksCutoff' argument of ?results
see metadata(res)$ihwResult on hypothesis weighting
log2 fold change (MLE): Region_Diet LIV_HCP.HP.UMEI vs LIV_LCP.LP.UMEI
out of 16278 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 47, 0.29%
LFC < 0 (down) : 37, 0.23%
outliers [1] : 8, 0.049%
[1] see 'cooksCutoff' argument of ?results
see metadata(res)$ihwResult on hypothesis weighting
log2 fold change (MLE): Region_Diet LIV_HCP.HP.UMEI vs LIV_HCP.HP.RMEI
out of 16278 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 236, 1.4%
LFC < 0 (down) : 197, 1.2%
outliers [1] : 8, 0.049%
[1] see 'cooksCutoff' argument of ?results
see metadata(res)$ihwResult on hypothesis weighting
log2 fold change (MLE): Region_Diet LIV_HCP.HP.UMEI vs LIV_HCP.LP.UMEI
out of 16278 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 41, 0.25%
LFC < 0 (down) : 21, 0.13%
outliers [1] : 8, 0.049%
[1] see 'cooksCutoff' argument of ?results
see metadata(res)$ihwResult on hypothesis weighting
log2 fold change (MLE): Region_Diet LIV_LCP.HP.UMEI vs LIV_LCP.LP.UMEI
out of 16278 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 2, 0.012%
LFC < 0 (down) : 11, 0.068%
outliers [1] : 8, 0.049%
[1] see 'cooksCutoff' argument of ?results
see metadata(res)$ihwResult on hypothesis weighting
log2 fold change (MLE): Region_Diet LIV_LCP.HP.UMEI vs LIV_HCP.HP.RMEI
out of 16278 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 7, 0.043%
LFC < 0 (down) : 13, 0.08%
outliers [1] : 8, 0.049%
[1] see 'cooksCutoff' argument of ?results
see metadata(res)$ihwResult on hypothesis weighting
log2 fold change (MLE): Region_Diet LIV_LCP.HP.UMEI vs LIV_HCP.LP.UMEI
out of 16278 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 2, 0.012%
LFC < 0 (down) : 1, 0.0061%
outliers [1] : 8, 0.049%
[1] see 'cooksCutoff' argument of ?results
see metadata(res)$ihwResult on hypothesis weighting
log2 fold change (MLE): Region_Diet LIV_LCP.LP.UMEI vs LIV_HCP.HP.RMEI
out of 16278 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 5, 0.031%
LFC < 0 (down) : 14, 0.086%
outliers [1] : 8, 0.049%
[1] see 'cooksCutoff' argument of ?results
see metadata(res)$ihwResult on hypothesis weighting
log2 fold change (MLE): Region_Diet LIV_LCP.LP.UMEI vs LIV_HCP.LP.UMEI
out of 16278 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 6, 0.037%
LFC < 0 (down) : 1, 0.0061%
outliers [1] : 8, 0.049%
[1] see 'cooksCutoff' argument of ?results
see metadata(res)$ihwResult on hypothesis weighting
log2 fold change (MLE): Region_Diet LIV_HCP.HP.RMEI vs LIV_HCP.LP.UMEI
out of 16278 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 8, 0.049%
LFC < 0 (down) : 5, 0.031%
outliers [1] : 8, 0.049%
[1] see 'cooksCutoff' argument of ?results
see metadata(res)$ihwResult on hypothesis weighting
Results generated.
Plotting MA plots (default DESeq2 style for QC)...
Finished MA plots.
Plotting p value histograms (for QC)...
Finished plotting p value histograms.
./outputs/normalised_counts/ Directory exists
Normalised tables exported to the sub-directory: ./outputs/normalised_counts/
Preparing data for output...
List output succesfully generated. Contains: dds_wald_object [dds after call to DESeq()], list of DESeq result objects and list of pairwise plots.
******************* END of - LIV *******************
#5. Accessing results TO DO:
DE_out_test$LIV_DESeq2_Output$DESeq2_annot_df
$`LIV_HCP.HP.UMEI vs LIV_LCP.HP.UMEI`
$`LIV_HCP.HP.UMEI vs LIV_LCP.LP.UMEI`
$`LIV_HCP.HP.UMEI vs LIV_HCP.HP.RMEI`
$`LIV_HCP.HP.UMEI vs LIV_HCP.LP.UMEI`
$`LIV_LCP.HP.UMEI vs LIV_LCP.LP.UMEI`
$`LIV_LCP.HP.UMEI vs LIV_HCP.HP.RMEI`
$`LIV_LCP.HP.UMEI vs LIV_HCP.LP.UMEI`
$`LIV_LCP.LP.UMEI vs LIV_HCP.HP.RMEI`
$`LIV_LCP.LP.UMEI vs LIV_HCP.LP.UMEI`
$`LIV_HCP.HP.RMEI vs LIV_HCP.LP.UMEI`
NA
#Workign
#testing how many have real names
x %>%
as.data.frame() %>%
tibble::rownames_to_column(var = "gene_ensembl") %>%
dplyr::left_join(annot, by = c("gene_ensembl")) %>%
dplyr::select(.data$gene_ensembl,
.data$gene_name,
.data$description,
tidyselect::everything()) %>%
arrange(.data$padj)